
Distilling interpretable causal trees from causal forests

Rehill, Patrick

arXiv.org Artificial Intelligence

Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing approaches that test a few pre-specified hypotheses. However, it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate individual-level estimates, but the underlying patterns can be hard to understand, and it can be hard to know what the analysis implies. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. The method compares well to existing ways of extracting a single tree, particularly in noisy or high-dimensional data with many correlated features, where it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
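A minimal sketch of the distillation idea, assuming the econml package for the causal forest and scikit-learn for the student tree. The paper's actual Distilled Causal Tree procedure may differ (e.g. in how targets and honesty are handled); this only illustrates the generic teacher-student pattern of fitting one shallow tree to a forest's estimated CATEs:

```python
import numpy as np
from econml.dml import CausalForestDML
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)          # randomized binary treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)     # true heterogeneous effect
Y = tau * T + X[:, 1] + rng.normal(size=n)

# Teacher: causal forest CATE estimates.
forest = CausalForestDML(discrete_treatment=True, random_state=0)
forest.fit(Y, T, X=X)
cate_hat = forest.effect(X)

# Student: one shallow, interpretable tree regressed on the forest's CATEs.
student = DecisionTreeRegressor(max_depth=3, random_state=0)
student.fit(X, cate_hat)
```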


When does Subagging Work?

Revelas, Christos, Boldea, Otilia, Werker, Bas J. M.

arXiv.org Machine Learning

We study the effectiveness of subagging, or subsample aggregating, applied to regression trees, a popular non-parametric method in machine learning. First, we give sufficient conditions for pointwise consistency of trees. We formalize that (i) the bias depends on the diameter of cells, so trees with few splits tend to be biased, and (ii) the variance depends on the number of observations in cells, so trees with many splits tend to have large variance. While these statements for bias and variance are known to hold globally in the covariate space, we show that, under some constraints, they also hold locally. Second, we compare the performance of subagging to that of single trees across different numbers of splits. We find that (1) for any given number of splits, subagging improves upon a single tree, and (2) this improvement is larger for many splits than for few splits. However, (3) a single tree grown at optimal size can outperform subagging if the sizes of the subagged trees are not optimally chosen. This last result runs counter to the common practice of growing large randomized trees to eliminate bias and then averaging to reduce variance.
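A minimal subagging sketch using scikit-learn: each tree is fit on a random subsample drawn without replacement (unlike bagging's bootstrap), predictions are averaged, and tree size is controlled via max_leaf_nodes, which is the tuning knob the paper's results concern. Function and parameter names here are illustrative, not from the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def subagging_predict(X_train, y_train, X_test, n_estimators=50,
                      subsample_frac=0.5, max_leaf_nodes=16, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    m = int(subsample_frac * n)
    preds = np.zeros((n_estimators, len(X_test)))
    for b in range(n_estimators):
        idx = rng.choice(n, size=m, replace=False)  # subsample, not bootstrap
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes,
                                     random_state=b)
        tree.fit(X_train[idx], y_train[idx])
        preds[b] = tree.predict(X_test)
    return preds.mean(axis=0)
```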


Predictive Density Combination Using a Tree-Based Synthesis Function

Chernis, Tony, Hauzenberger, Niko, Huber, Florian, Koop, Gary, Mitchell, James

arXiv.org Machine Learning

Bayesian predictive synthesis (BPS) provides a method for combining multiple predictive distributions based on agent/expert opinion analysis theory and encompasses a range of existing density forecast pooling methods. The key ingredient in BPS is a "synthesis" function. This is typically specified parametrically as a dynamic linear regression. In this paper, we develop a nonparametric treatment of the synthesis function using regression trees. We show the advantages of our tree-based approach in two macroeconomic forecasting applications. The first uses density forecasts for GDP growth from the euro area's Survey of Professional Forecasters. The second combines density forecasts of US inflation produced by many regression models involving different predictors. Both applications demonstrate the benefits -- in terms of improved forecast accuracy and interpretability -- of modeling the synthesis function nonparametrically.
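A heavily simplified, non-Bayesian analogue of the idea. The paper synthesizes full predictive densities within BPS; the sketch below only illustrates the core substitution of a regression tree for a linear synthesis regression, using agents' point forecasts as inputs. The data-generating process is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
T_obs, n_agents = 400, 5
F = rng.normal(size=(T_obs, n_agents))  # agents' forecasts (features)
# Outcome depends on forecasts nonlinearly, so linear pooling is misspecified.
y = np.where(F[:, 0] > 0, F[:, 1], F[:, 2]) + 0.1 * rng.normal(size=T_obs)

linear_synth = LinearRegression().fit(F[:300], y[:300])      # parametric
tree_synth = DecisionTreeRegressor(max_depth=4).fit(F[:300], y[:300])

for name, model in [("linear", linear_synth), ("tree", tree_synth)]:
    mse = np.mean((model.predict(F[300:]) - y[300:]) ** 2)
    print(f"{name} synthesis out-of-sample MSE: {mse:.3f}")
```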


LinXGBoost: Extension of XGBoost to Generalized Local Linear Models

de Vito, Laurent

arXiv.org Machine Learning

XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though its predictions are piecewise constant. This might be justified in high-dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. We extend XGBoost into LinXGBoost, which stores a linear model at each leaf. This extension, equivalent to piecewise regularized least-squares, is particularly attractive for regressing functions that exhibit jumps or discontinuities, which are notoriously hard to regress. We compare our extension to vanilla XGBoost and Random Forest in experiments on both synthetic and real-world data sets.
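A minimal sketch of the "linear model in each leaf" idea using scikit-learn. LinXGBoost builds this into boosting with regularized least-squares; here we simply bolt leaf-level ridge models onto a single fitted tree, which captures the piecewise-linear structure but none of the boosting machinery:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge

class LinearLeafTree:
    """A single tree whose leaves each hold a ridge regression."""

    def __init__(self, max_leaf_nodes=8, alpha=1.0):
        self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        self.alpha = alpha
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)           # leaf id for each sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = Ridge(alpha=self.alpha).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        out = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out
```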


Ensemble Monte-Carlo Planning: An Empirical Study

Fern, Alan (Oregon State University) | Lewis, Paul (Oregon State University)

AAAI Conferences

Monte-Carlo planning algorithms, such as UCT, select actions at each decision epoch by intelligently expanding a single search tree given the available time and then selecting the best root action. Recent work has provided evidence that it can be advantageous to instead construct an ensemble of search trees and to make a decision according to a weighted vote. However, these prior investigations have only considered the application domains of Go and Solitaire and were limited in the scope of ensemble configurations considered. In this paper, we conduct a more exhaustive empirical study of ensemble Monte-Carlo planning using the UCT algorithm in a set of six additional domains. In particular, we evaluate the advantages of a broad set of ensemble configurations in terms of space and time efficiency in both parallel and single-core models. Our results demonstrate that ensembles are an effective way to improve performance per unit time given a parallel time model and performance per unit space in a single-core model. However, contrary to prior isolated observations, we did not find significant evidence that ensembles improve performance per unit time in a single-core model.
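A minimal sketch of ensemble Monte-Carlo planning by weighted root vote. Here `uct_search` is a hypothetical placeholder for any UCT implementation that returns estimated root-action values, and the weighting scheme is one simple choice among the many configurations the paper studies:

```python
import random
from collections import defaultdict

def ensemble_uct_decide(state, actions, uct_search, n_trees=8,
                        budget_per_tree=1000, seed=0):
    # `uct_search(state, actions, budget, rng)` is assumed to return a
    # dict {action: estimated value} for the root state (hypothetical API).
    rng = random.Random(seed)
    votes = defaultdict(float)
    for _ in range(n_trees):
        # Each tree gets its own RNG stream so its rollouts differ.
        root_values = uct_search(state, actions,
                                 budget=budget_per_tree,
                                 rng=random.Random(rng.random()))
        best = max(root_values, key=root_values.get)
        # Weight each tree's vote by its estimated value of its best action.
        votes[best] += root_values[best]
    return max(votes, key=votes.get)
```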